ProGmatica: a Prosodic and Pragmatic Database for European Portuguese
Abstract
In this work, we present a spontaneous speech corpus of broadcast television material in European Portuguese (EP). We named it ProGmatica because it is meant to combine prosodic information within a pragmatic framework. Our purpose is to analyse, describe and predict the prosodic patterns involved in speech acts and discourse events. It is also our goal to relate both prosody and pragmatics to emotion, style and attitude. In future developments, we intend in this way to provide EP text-to-speech (TTS) systems with pragmatic and emotional dimensions. From the recorded material as a whole, we selected, extracted and saved prototypical speech acts with the help of speech analysis tools. The result is a multi-speaker corpus in which linguistic, paralinguistic and extralinguistic information is labelled and interrelated. The paper is organized as follows. Section one gives a brief state of the art of the available EP corpora containing prosodic information. Section two explains the pragmatic criteria used to structure the database, and then describes how the speech signal was labelled and which information layers were considered. Section three proposes a prosodic prediction model to be applied to each speech act in the future. Section four discusses some of the main problems we encountered and presents future work.
Similar resources
Progmatica: A Prosodic Database for European Portuguese
In this work, a spontaneous speech corpus of broadcasted television material in European Portuguese (EP) is presented. We decided to name it ProGmatica as it is meant to combine prosody information under a pragmatic framework. Our purpose is to analyse, describe and predict the prosodic patterns that are involved in speech acts and discourse events. It is also our goal to relate both prosody an...
On the Use of Prosodic Labelling in Corpus-Based Linguistic Studies of Spontaneous Speech
This paper addresses the construction of a spontaneous speech corpus in European Portuguese (hereafter EP). The corpus is presented and the proposed prosodic labelling scheme is explained. The objective of this work is to provide a tool for linguistic analysis suited to several research topics that have speech and dialogue as their objects. The main features considered in the database w...
Affirmative constituents in European Portuguese dialogues: prosodic and pragmatic properties
This paper investigates the correlation between the prosodic properties and pragmatic functions of affirmative constituents in adult-adult interactions in European Portuguese (CORAL corpus). 515 affirmative constituents produced in 460 answers, extracted from 11 dialogues between 12 speakers, were analyzed. Results show that: i) sim ‘yes’, ok and grunts are the most frequent affirmative constit...
Extending AuToBI to prominence detection in European Portuguese
This paper describes our exploratory work in applying the Automatic ToBI annotation system (AuToBI), originally developed for Standard American English, to European Portuguese. This work is motivated by the current availability of large amounts of (highly spontaneous) transcribed data and the need to further enrich those transcripts with prosodic information. Manual prosodic annotation, however...
Music and speech in early development: automatic analysis and classification of prosodic features from two Portuguese variants
In the present study, we aim to capture rhythmic and melodic patterning in speech and singing directed to infants. We address this issue by exploring the acoustic features that best predict different classification problems. We built a database composed of infant-directed speech from two Portuguese variants (European vs Brazilian Portuguese) and infant-directed singing from the two cultures, co...